Boosting for Chinese Named Entity Recognition

نویسندگان

  • Xiaofeng Yu
  • Marine Carpuat
  • Dekai Wu
چکیده

We report an experiment in which a highperformance boosting based NER model originally designed for multiple European languages is instead applied to the Chinese named entity recognition task of the third SIGHAN Chinese language processing bakeoff. Using a simple characterbased model along with a set of features that are easily obtained from the Chinese input strings, the system described employs boosting, a promising and theoretically well-founded machine learning method to combine a set of weak classifiers together into a final system. Even though we did no other Chinese-specific tuning, and used only one-third of the MSRA and CityU corpora to train the system, reasonable results are obtained. Our evaluation results show that 75.07 and 80.51 overall F-measures were obtained on MSRA and CityU test sets respectively.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Named Entity Recognition with Cascaded Hybrid Model

We propose a high-performance cascaded hybrid model for Chinese NER. Firstly, we use Boosting, a standard and theoretically wellfounded machine learning method to combine a set of weak classifiers together into a base system. Secondly, we introduce various types of heuristic human knowledge into Markov Logic Networks (MLNs), an effective combination of first-order logic and probabilistic graphi...

متن کامل

Chinese Named Entity Recognition Based on Hierarchical Hybrid Model

Chinese named entity recognition is a challenging, difficult, yet important task in natural language processing. This paper presents a novel approach based on a hierarchical hybrid model to recognize Chinese named entities. Three mutually dependent stages-boosting, Markov Logic Networks (MLNs) based recognition, and abbreviation detection are integrated in the model. AdaBoost algorithm is utili...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006